Bayesian generalization with circular consequential regions
Authors: T.L. Griffiths, J.L. Austerweil
Abstract
Generalization – deciding whether to extend a property from one stimulus to another stimulus – is a fundamental problem faced by cognitive agents in many different settings. Shepard (1987) provided a mathematical analysis of generalization in terms of Bayesian inference over the regions of psychological space that might correspond to a given property. He proved that in the unidimensional case, where regions are intervals of the real line, generalization will be a negatively accelerated function of the distance between stimuli, such as an exponential function. These results have been extended to rectangular consequential regions in multiple dimensions, but not to circular consequential regions, which play an important role in explaining generalization for stimuli that are not represented in terms of separable dimensions. We analyze Bayesian generalization with circular consequential regions, providing bounds on the generalization function and proving that this function is negatively accelerated. © 2012 Elsevier Inc. All rights reserved.
Generalizing a property from one stimulus to another is a fundamental problem in cognitive science. The problem arises in many forms across many different domains, from higher-level cognition (e.g., concept learning, Tenenbaum (2000)) to linguistics (e.g., word learning, Xu and Tenenbaum (2007)) to perception (e.g., color categorization, Kay and McDaniel (1978)). The ability to generalize effectively is a hallmark of cognitive agents and seems to take a consistent form across domains and across species (Shepard, 1987). This consistency led Shepard (1987) to propose a “universal law” of generalization, arguing that the probability of generalizing a property decays exponentially as a function of the distance between two stimuli in psychological space. This argument was based on a mathematical analysis of generalization as Bayesian inference.
Shepard’s (1987) analysis asserted that properties pick out regions in psychological space (“consequential regions”). Upon observing that a stimulus possesses a property, an agent makes an inference as to which consequential regions could correspond to that property. This is done by applying Bayes’ rule, yielding a posterior distribution over regions. The probability of generalizing to a new stimulus is computed by summing over all consequential regions that contain both the old and the new stimulus, weighted by their posterior probability. Shepard gave analytical results for generalization along a single dimension, where consequential regions correspond to intervals of the real line, proving that generalization should be a negatively accelerated function of distance, such as an exponential. He also simulated results for generalization in two dimensions, examining how the pattern of generalization related to the choice of consequential regions. The resulting model explains generalization behavior as optimal statistical inference according to a probabilistic model – a rational analysis of generalization (Anderson, 1990; Chater & Oaksford, 1999) – and is one of the most important precursors of the recent surge of interest in Bayesian models of cognition, which include extensions of the Bayesian generalization framework beyond spatial representations (Navarro, Dry, & Lee, 2012; Tenenbaum & Griffiths, 2001). One of the valuable insights yielded by Shepard’s (1987) analysis was that different patterns of generalization could be captured by making different assumptions about consequential regions.
Correspondence to: University of California, Berkeley, Department of Psychology, 3210 Tolman Hall # 1650, Berkeley, CA 94720-1650, United States (T.L. Griffiths, J.L. Austerweil). doi:10.1016/j.jmp.2012.07.002
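The inference described in Shepard's analysis (a posterior over regions, followed by a sum over the regions that contain both stimuli) can be sketched numerically for the unidimensional case. The following is only an illustrative sketch, not Shepard's derivation: the grid of interval hypotheses, the Erlang-shaped prior on interval length, and the 1/length likelihood (a one-dimensional analogue of a uniform density over the region) are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math

def size_prior(length):
    """Hypothetical Erlang-shaped prior on interval length (an assumption)."""
    return length * math.exp(-length)

def generalization_1d(x, y, centers, lengths, prior):
    """Posterior-weighted share of interval hypotheses containing x that also contain y."""
    num = 0.0
    den = 0.0
    for length in lengths:
        half = length / 2.0
        for c in centers:
            if abs(x - c) > half:
                continue  # interval does not contain x, so its likelihood is zero
            weight = prior(length) / length  # prior times an assumed 1/length likelihood
            den += weight
            if abs(y - c) <= half:
                num += weight  # interval contains both the old and the new stimulus
    return num / den

# Illustrative grid of hypotheses: centers in [-10, 10], lengths in [0.1, 10].
centers = [i * 0.05 for i in range(-200, 201)]
lengths = [0.1 * k for k in range(1, 101)]

distances = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
curve = [generalization_1d(0.0, d, centers, lengths, size_prior) for d in distances]
```

Under these assumptions `curve` starts at 1 for zero distance and does not increase as the distance grows, because every interval that contains both 0 and a far point also contains any nearer point between them.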
People use two different kinds of metrics when forming generalizations about multi-dimensional stimuli: separable dimensions are associated with exponential decay in “city-block” distance or the L1 metric, while integral dimensions are associated with exponential decay in Euclidean distance or the L2 metric (Garner, 1974). These different metrics also have consequences beyond generalization behavior, influencing how people categorize objects varying along different dimensions (Handel & Imai, 1972) and whether people can selectively attend to each dimension (Garner & Felfoldy, 1970). Additionally, there is evidence that people can learn which metric they should use for generalization based on concept learning (Austerweil & Griffiths, 2010). In the Bayesian generalization model, the difference between separable and integral dimensions emerges as the result of probabilistic inference with different kinds of consequential regions (Davidenko & Tenenbaum, 2001; Shepard, 1987, 1991).
T.L. Griffiths, J.L. Austerweil / Journal of Mathematical Psychology 56 (2012) 281–285
When consequential regions are aligned with the axes of the space, such as rectangles or ellipses that have their major axes parallel to the dimensions in which stimuli are expressed, a pattern of generalization similar to that seen for separable dimensions emerges. When consequential regions are indifferent to the axes of the space, such as circles or randomly-oriented rectangles or ellipses, a pattern of generalization similar to that seen with integral dimensions appears. Shepard (1987) noted: “For stimuli, like colors, that differ along dimensions that do not correspond to uniquely defined independent variables in the world, moreover, psychological space should have no preferred axes. The consequential region is then most reasonably assumed to be circular or, whatever other shapes may be assumed, to have all possible orientations in the space with equal probability” (p. 1322).
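To make the contrast between the two metrics concrete, the sketch below computes the city-block (L1) and Euclidean (L2) distance between the same pair of points and passes each through an exponential gradient. The decay parameter `rate` and the function names are hypothetical, introduced only for illustration.

```python
import math

def city_block(a, b):
    """L1 (city-block) distance between two points."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def euclidean(a, b):
    """L2 (Euclidean) distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def exponential_gradient(distance, rate=1.0):
    """Exponential decay in distance; `rate` is a hypothetical decay parameter."""
    return math.exp(-rate * distance)

a, b = (0.0, 0.0), (3.0, 4.0)
g_separable = exponential_gradient(city_block(a, b))  # decay in L1 distance
g_integral = exponential_gradient(euclidean(a, b))    # decay in L2 distance
```

For stimuli that differ on both dimensions, the L1 distance is at least as large as the L2 distance, so with the same decay rate the city-block gradient is the lower of the two.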
Despite the importance of considering different kinds of consequential regions in multidimensional spaces to Shepard’s (1987) theory, the result that the generalization function should be negatively accelerated was only proved in the unidimensional case. Subsequent analyses have shown that negatively accelerated functions can be obtained with rectangular consequential regions (Myung & Shepard, 1996; Tenenbaum, 1999a,b) and generalized the argument to discrete representations (Austerweil & Griffiths, 2010; Chater & Vitanyi, 2003; Russell, 1986; Tenenbaum & Griffiths, 2001). However, the case of circular consequential regions – which are particularly important for representing integral dimensions, as noted above – has not been investigated in detail. In this article, we derive bounds and prove that the function produced by Bayesian generalization with multidimensional circular consequential regions is negatively accelerated, extending Shepard’s original result to this multidimensional case.
The strategy behind our analysis is as follows. We begin by formulating the problem of generalization as Bayesian inference for an unknown consequential region. Next, we reparameterize the problem to allow us to simplify the probability of generalizing to a new stimulus to the integral of a simple function. Unfortunately the integral has no known closed form solution, leading us to attack it in two ways. First, we derive bounds on the integral that approximate the true solution. Second, we prove through analysis of the derivatives of the integral that the solution to the integral is convex and must be monotonically decaying in the Euclidean distance between the two stimuli.
1. Problem formulation
Assume that an observation x is drawn from a circular consequential region in $\mathbb{R}^2$. Then we have
$$p(x \mid c, s) = \begin{cases} \dfrac{1}{\pi s} & \|x - c\|^2 \le s \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
where c is the center of the consequential region, with s the square of its radius.
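The likelihood in Equation (1) is a uniform density over a disk of area πs. A minimal sketch of it follows, together with a rough grid check that it integrates to approximately one. The function names, the grid step, and the integration window are assumptions made for illustration.

```python
import math

def likelihood(x, c, s):
    """Equation (1): uniform density over the disk with center c and squared radius s."""
    sq_dist = (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return 1.0 / (math.pi * s) if sq_dist <= s else 0.0

def grid_integral(c, s, half_width=1.5, step=0.01):
    """Midpoint-rule integral of the density over a square window around the origin."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        xi = -half_width + (i + 0.5) * step
        for j in range(n):
            yj = -half_width + (j + 0.5) * step
            total += likelihood((xi, yj), c, s) * step * step
    return total

inside = likelihood((0.5, 0.0), (0.0, 0.0), 1.0)   # point within the unit disk
outside = likelihood((2.0, 0.0), (0.0, 0.0), 1.0)  # point beyond the radius
total_mass = grid_integral((0.0, 0.0), 1.0)
```

Because s is the squared radius, larger regions spread the same unit mass over a larger area, so any single observation is less probable under a large region than under a small one that still contains it.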
We can then consider the set of all possible consequential regions from which the observation might have been drawn, which is here the set of all possible circles, and use Bayes’ rule to calculate the probability of each consequential region given the observation of x. Specifically, we have
$$p(h \mid x) = \frac{p(x \mid h)\, p(h)}{p(x)} \tag{2}$$
where h is some hypothetical consequential region, here consisting of a pair (c, s). To evaluate the denominator, we simply compute $\int_{h \in H} p(x \mid h)\, p(h)\, dh$, where H is the set of all hypotheses under consideration, here being all pairs (c, s).
Fig. 1. Parameterization used to compute P(y ∈ C | x).
From this we can obtain the probability that some other point y is in the true consequential region from which x was drawn.
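One way to see what this probability looks like is a numerical sketch. It is not the reparameterization used in the paper, which this excerpt does not show. It assumes a uniform (improper) prior over centers c and a hypothetical exponential prior over s. Under a uniform prior on c, the centers consistent with x form a disk of radius √s around x, and the centers that also contain y form the lens where that disk overlaps the equal disk around y. The lens area is given by the standard formula for two equal circles.

```python
import math

def overlap_fraction(d, s):
    """Share of centers within sqrt(s) of x that are also within sqrt(s) of y,
    where d is the Euclidean distance between x and y."""
    r = math.sqrt(s)
    if d >= 2.0 * r:
        return 0.0  # the two disks of admissible centers do not overlap
    # Area of the lens formed by two disks of radius r whose centers are d apart.
    lens = (2.0 * s * math.acos(d / (2.0 * r))
            - 0.5 * d * math.sqrt(max(0.0, 4.0 * s - d * d)))
    return lens / (math.pi * s)

def size_prior(s):
    """Hypothetical exponential prior on the squared radius s (an assumption)."""
    return math.exp(-s)

def generalization(d, sizes, prior):
    """Approximate P(y in C | x) by averaging the overlap fraction over a grid of s."""
    num = sum(prior(s) * overlap_fraction(d, s) for s in sizes)
    den = sum(prior(s) for s in sizes)
    return num / den

sizes = [0.01 * k for k in range(1, 1001)]   # grid over squared radius, 0.01 to 10
distances = [0.25 * i for i in range(13)]    # equally spaced distances, 0 to 3
curve = [generalization(d, sizes, size_prior) for d in distances]
```

With these assumptions the 1/(πs) factor in the likelihood cancels against the area of admissible centers, so the posterior over s matches its prior and the curve is a prior-weighted average of overlap fractions. On this grid the values start at 1, decrease with distance, and have non-negative second differences, which is consistent with a negatively accelerated function.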